Computational drug repositioning with attention walking

Drug repositioning aims to identify new therapeutic indications for approved medications. Recently, the importance of computational drug repositioning has been highlighted because it can reduce the costs, development time, and risks compared to traditional drug discovery. Most approaches in this area use networks for systematic analysis. Inferring drug-disease associations is then defined as a link prediction problem in a heterogeneous network composed of drugs and diseases. In this article, we present a novel method of computational drug repositioning, named drug repositioning with attention walking (DRAW). DRAW proceeds as follows: first, a subgraph enclosing the target link for prediction is extracted. Second, a graph convolutional network captures the structural features of the labeled nodes in the subgraph. Third, the transition probabilities are computed using attention mechanisms and converted into random walk profiles. Finally, a multi-layer perceptron takes random walk profiles and predicts whether a target link exists. As an experiment, we constructed two heterogeneous networks with drug-drug similarities based on chemical structures and anatomical therapeutic chemical classification (ATC) codes. Using 10-fold cross-validation, DRAW achieved an area under the receiver operating characteristic (ROC) curve of 0.903 and outperformed state-of-the-art methods. Moreover, we demonstrated the results of case studies for selected drugs and diseases to further confirm the capability of DRAW to predict drug-disease associations.


Methods
In this section, we introduce our computational drug repositioning method, DRAW, which predicts drug-disease associations through the following steps.First, a drug-disease heterogeneous network is constructed in the form of an undirected, unweighted graph.Second, the subgraph enclosing the target link for prediction is extracted.This is used as the input for the binary graph classifier.Third, nodes in the subgraph are labeled as the Dual Radius Node Labeling (DRNL) scheme.Fourth, a GCN is adopted to extract the structural features of the subgraph.Fifth, a random walk profile is computed using attention mechanisms.Sixth, the features of the node-, edge-, and graph-level are calculated using the random walk profile.Finally, the graph classifier takes all features as inputs to score the subgraph for the purpose of predicting the presence of the target link.Overall, we enhanced the methodology to be applied to drug repositioning, incorporating refined network construction and parameter tuning into the original process of WalkPooling 22 .Figure 1 shows the workflow of DRAW.www.nature.com/scientificreports/

Network construction
A drug-disease heterogeneous network is structured by connecting drug pairs, disease pairs, and known drugdisease associations.However, including all drug-drug similarities and disease-disease similarities requires considerable memory usage and time.Thus, we selected drug pairs (and disease pairs) in order of the highest similarities as edges and adjusted the network density.We set the hyperparameters D dr and D di to represent density thresholds of the drug and disease networks, respectively.For the drug network, node pairs are selected up to 0.01 × D dr × N dr × (N dr − 1) ÷ 2 where N dr is the number of drugs.The disease network is the same as the drug network.In our experiment, we set densities D dr and D di to 4, empirically.The effect of network densities will be further discussed in the Results section.All edges in this heterogeneous network were weighted as 1 to create an unweighted graph.

Subgraph extraction
To extract the subgraph enclosing the target link for prediction, a k-hop subgraph was sampled.According to previous reports 22,26 , the existence of a link between two nodes depends on the connectivity of their close neighbors.For instance, link prediction using the Jaccard index showed good performance, even though it considered only the closest neighbors.In agreement with these studies, we adopted k = 2 in our experiment.When a subgraph is extracted, the link prediction problem can be redefined as a graph classification problem.If a graph is classified as true, then the target link is predicted positively and vice versa.

Node labelling by DRNL
To create topological features of the subgraph, we used the DRNL algorithm proposed by Zhang and Chen 26 .
The result involves the extent to which each node is separated from the target nodes connected by the target link.This is formulated as follows: where i is the node to be labeled, x and y are the target nodes, d x and d y represent the distance between i and x and between i and y , respectively, ) is the quotient divided by 2, and ( d%2) is the modulo operation divided by 2. If i cannot reach x or y , then i is assigned a null label, 0. This method labels x and y as 1.The node labels are converted into one-hot encoding vectors.

Feature extraction by GCN
We adopted a GCN 21 to capture the structural features of the labeled nodes in the subgraph, which resulted in node representations.The GCN extracts not only local features but also global features, allowing us to deal with diverse features in the topology.The GCN updates hidden states as follows: where X (l+1) is the result of the l th GCN layer in the matrix form of the number of nodes by the number of features, A is the adjacency matrix of the input subgraph, X (l) is the results of the previous GCN layer, σ is the activation function, A is the adjacency matrix adding self-loops calculated in the form of A = A + I , I is the identity matrix, D is the diagonal matrix whose elements represent node degrees, W (l) is the trainable matrix at the l th layer.

Random walk profile generation by attention
A random walk algorithm computes transition probabilities of nodes based on edge weights in a graph.Unlike conventional methods, we used attention mechanisms to quantify which specific nodes will be emphasized based on the connectivity between nodes.The resulting attention scores improve the quality of the transition probabilities.The attention score ω i,j between nodes i and j is calculated as follows: where Q θ is the query function, K θ is the key function, z i and z j are the features of i and j computed from the GCN layer, respectively, and N att is the number of output dimensions of attention.This equation is also known as the value function in attention mechanisms.The attention scores between the nodes from this equation are encoded into the transition matrix P .The ( i , j)-element of P , p i,j , which indicates the transition probability from i to j, is computed as follows: where N (i) is the set of neighbors of i .p i,j can be defined when i and j are linked.If i and j are not linked, then p i,j is zero.This framework adopts multi-head attention, as has been used in most previous studies for attention mechanisms.We applied 2-head attention in our experiment.The τ-th power of P , P τ , refers to the probability that a random walker will arrive at a walk of length-τ from node to node.We gather node-level, edge-level, and graph-level features for the random walk profile at length-τ as follows: where x and y are the nodes linked by the target edge for prediction.Node-level features represent loop struc- tures around x and y .Because we are dealing with undirected graphs, the summation of the node-level features guarantees that they are independent of the ordering of x and y .Edge-level features describe the probability of a random walker reaching a target edge.Graph-level features are related to the self-loop probability of all nodes in the graph.Node pairs without an edge are always considered negative for training and prediction.More precisely, during the training phase, a target link must always be present for a positive sample to be learned, and a target link must always be absent for a negative sample to be learned.Predictions, however, have to be made without a target link at all times during the prediction phase.This unfavorable situation typically causes overfitting.Thus, we train two distinct types of graphs: one includes the target link for prediction, while the other excludes the target link.These two different graphs are represented as G + and G − , respectively.This data augmentation technique was applied to prevent overfitting.The attention mechanisms that are conducted in the two graphs, G + and G − , create transition matrices P + and P − and random walk profiles including node τ ,+ , node τ ,− , edge τ ,+ , edge τ ,− , graph τ ,+ , and graph τ ,− .However, a trace operation for graph-level features disturbs the structural information around the target link, making it unsuitable for link prediction.To solve this problem, WalkPooling 22 used the "background subtraction" technique defined as �graph τ = graph τ ,+ − graph τ ,− .Finally, the features are concatenated as: where G is the subgraph enclosing the target link, Z is the features of G computed by GCN, and τ c is the maximum walk length.In our experiment, we applied τ c = 7 as a default.This equation is employed for each attention head.In the case of adopting multi-head attention as mentioned above, the resultant feature space forms the size of num_heads × ((5 × τ c ) + 1).

Subgraph classification by MLP
The random walk profiles computed in the above steps are fed into a multi-layer perceptron (MLP) to predict whether a target link exists.An MLP consists of an input layer, four hidden layers, and an output layer.The ReLU function was adopted for activation through the hidden layers, and the sigmoid function was adopted for activation in the output layer.

Drug similarity networks
DrugBank 27 was used to collect the drug datasets.This database includes a wide range of drug-related features, such as drug indications, drug targets, chemical structures, and drug-drug interactions.In this study, similarities between drugs were calculated using two salient characteristics: chemical structures and ATC codes.First, drug structural similarity was determined using simplified molecular-input line-entry specification (SMILES) 28 , which is a line notation system used to represent chemical compound structures.The Chemistry Development Kit (CDK) 29 was employed to convert a pair of structures in SMILES format into a Tanimoto similarity score.Second, the similarities between drugs were measured based on their ATC codes 30 , a system to classify drugs in a hierarchy of pharmacological, therapeutic, and chemical categories.The similarity between ATC codes was calculated as follows: where ATC indicates each ATC code, and C(ATC) is the set of codes from all ATC levels.It is noted that a drug may have multiple ATC codes, thus we used the average similarity of all ATC code pairs to calculate the similarity between drugs as follows: where dr represents each drug, X and Y indicate the sets of ATC codes of each drug, and |X| and |Y | is the size of X and Y , respectively.Each of the two drug similarity networks was merged with the disease similarity network and drug-disease associations to create two different heterogeneous networks, named network-CS and network-ATC.

Disease similarity networks
Online Mendelian Inheritance in Man (OMIM) 31 is an extensive collection of human genes and genetic diseases.It is continuously updated with a focus on disease-associated genes.To measure similarities between diseases, many previous studies have used MimMiner 32 , which provides a convention for representing phenotype networks.However, to analyze and quantify the relationships between diseases more accurately, we employed an ontology, which is a conceptual representation of entities with a standardized structure that links them based on the relationships between their meanings.We used Human Phenotype Ontology (HPO) 33 , a comprehensive phenotype ontology consisting of phenotypic abnormality terms linked by parent-child relationships.HPO also provides human disease annotations originating from the OMIM 31 , OrphaNet 34 , and DECIPHER 35 databases.For our experiment, diseases from OMIM were extracted from HPO annotations.We adopted an approach to measure semantic similarity as suggested previously 36 .The following describes the calculation of the semantic similarity between two diseases: www.nature.com/scientificreports/where d 1 and d 2 are diseases annotated to HPO terms and T(d) is the set of HPO terms annotating d , Note that a single disease can be annotated to multiple HPO terms.Finally, sim T denotes the semantic similarity between the two HPO terms C i and C j , which is calculated as follows: where A t (C) is a set of ancestor terms of C and P(C) is the ratio of annotations as the number of annotations to C over the number of annotations to all terms in HPO.

Drug-disease associations
The Cdataset 37 is one of the most widely used benchmark datasets for drug-disease associations in recent drug repositioning research.This benchmark, which contains 663 drugs, 409 diseases, and 2352 drug-disease associations, is an upgraded version of the initial ground-truth, Fdataset 38 , and was created by appending clinically validated data from the DNdataset 10 .From the Cdataset, we selected only drugs and diseases available in the constructed drug and disease similarity networks.Finally, 659 drugs, 285 diseases, and 1728 drug-disease associations were selected for network-CS, and 636 drugs, 285 diseases, and 1681 drug-disease associations were selected for network-ATC, as shown in Table 1.

Experimental setting
The predictive results of drug-disease associations were assessed individually on the drug side and disease side.Prediction on the drug side identifies new diseases that each medication could treat, whereas prediction on the disease side identifies medications with the potential to treat each disease.We applied 10-fold cross-validation for this assessment.The folds were evenly divided based on the number of drugs, diseases, and their associations to ensure impartial analysis.
While the sigmoid function in the output layer is typically applied for binary classification, for ranking potential target links, we opt not to use the sigmoid activation, thereby leveraging the model's raw scores.To maintain the integrity of the training process and ensure a balanced representation, negative samples were selected to mirror the quantity of positive training data, explicitly excluding any instances from the test dataset.The optimization of this model is guided by minimizing the binary cross-entropy loss function.
The AUC was used to compare predictive performance.An ROC curve was created by plotting the true positive rate (TPR) against the false positive rate (FPR) as the threshold settings were changed.The AUC is typically regarded as the most effective metric for quantifying predictive power.We also used AUPR*, a transformed version of the area under the precision-recall curve (AUPR) described previously 11 .The precision-recall curve plots precision against recall as the threshold settings change.However, drug-disease associations were remarkably sparse in our experimental dataset.This generally causes very low precision because of the extremely large number of false positives (FP) compared to true positives (TP), where precision is TP/(TP + FP) .To resolve the biased results from AUPR, we adopted AUPR* using precision* instead of precision, where precision* was defined as TPR/(TPR + FPR).

Predictive accuracy comparison
Recent network-based approaches for drug-disease association prediction can be divided into three categories: graph-mining algorithms, matrix factorization, and deep learning models.In this section, we compare the predictive performance of DRAW with that of five state-of-the-art methods: three methods using deep-learning models (deepDR, ANMF, and LAGCN), and the most recent methods in the other two categories (BGMSDDA and MSBMF), as listed below.The best hyperparameter values recommended in the previous studies were used to implement each method.To compare the predictive performance, we applied 10-fold cross-validation for all the methods.
• BGMSDDA 39 applies a graph diffusion technique to a bipartite graph that integrates multiple similarities using Gaussian interaction profiles.• MSBMF 40 is based on bi-linear matrix factorization using multiple similarities as latent features.
• deepDR 41 constructs multiple positive point-wise mutual information (PPMI) matrices from multiple sources, and applies a multi-modal deep autoencoder (MDA) to combine these matrices.
Table 1.Statistics of the two datasets that we used for our experiment on drug-disease association prediction.

Number of drugs Number of diseases Number of associations Sparsity
Network www.nature.com/scientificreports/ • ANMF 42 uses an autoencoder with the similarities including Gaussian noise to extract the features of drugs and diseases.• LAGCN 43 adopts the attention mechanism for layers in a GCN to predict drug-disease associations.
Table 2 shows the predictive accuracy of the selected methods in terms of AUC and AUPR* when network-ATC was used.The proposed method, DRAW, had the highest AUC and AUPR* in both drug and disease side predictions.On the drug side, the AUC of DRAW was 0.903, which was 2.5% higher than that of the secondranked BGMSDDA.DRAW also had the highest AUPR* score (0.915).On the disease side, the gap between DRAW and the second class widened.DRAW achieved an AUC of 0.807, which was 8.8% higher than that of the second-ranked LAGCN, and an AUPR* of 0.807, which was 7.7% higher than that of the second-ranked BGMSDDA. Figure 2 shows the ROC and precision*-recall curves for predicting drug-disease associations with network-ATC.These curves verify that DRAW was superior to the other methods, especially for disease side prediction as shown in Figs.2c and d.
Table 3 shows the predictive accuracy of the selected methods in terms of AUC and AUPR* when network-CS was used.For disease side prediction, DRAW performed better than the other methods.DRAW achieved an AUC of 0.752 and an AUPR* of 0.784, which were 8.3% and 3.2% higher, respectively, than those of the second-ranked BGMSDDA.However, for drug side prediction, DRAW had slightly lower accuracy than BGMSDDA, which is a graph-mining algorithm, and MSBMF, which is a matrix factorization algorithm, in terms of both AUC and AUPR*.Figure 3 shows the ROC and precision*-recall curves for predicting drug-disease associations with network-CS.Figure 3c and d show that DRAW was more accurate than the others for disease side prediction, whereas BGMSDDA was better than DRAW for drug side prediction in Fig. 3a and b.
Overall, DRAW outperformed the competitive methods.However, the previously proposed deep learning methods generally showed slightly lower accuracy than the graph mining and matrix factorization algorithms, particularly for drug side prediction.Typically, approaches based on deep learning models are highly sensitive to the quality and quantity of input data.For this reason, the deep learning methods, including DRAW, showed relatively low predictive accuracy on the drug side with network-CS, although DRAW always achieved first place among the deep learning methods selected for this experiment.

Effects of network density
In network-based approaches for drug-disease association prediction, the density of input networks significantly affects predictive accuracy.We assessed the effect of the density of the drug and disease networks in our 10-fold cross-validation experiments.Figure 4 shows the distributions of AUC values from 10 folds when DRAW predicts drug-disease associations on the drug side with network-ATC.The four boxplots in this figure show the results when the network density thresholds were 2%, 3%, 4%, and 5%.The highest median AUC was achieved when the density threshold was 4%, indicating that the densities of both the drug and disease networks were 4%.When the density threshold is 5%, the median AUC decreased, and AUC values were more widely dispersed, including an AUC lower than 0.86 as an outlier.Selecting a higher density threshold implies that the input network contains more edges with lower similarity scores.It can be verified that selecting a density threshold higher than 4% negatively affects predictive performance.Therefore, we used a density threshold of 4% in our experiments.

Case studies
In this section, we present exploratory case studies of drug repositioning for specific drugs and diseases.We created a training set including all known drug-disease associations and a test set comprising the other drug-disease pairs in network-ATC.DRAW learned the training set to extract features and computed the prediction scores for drug-disease pairs in the test set.For each drug, all diseases were listed in descending order of their prediction scores.For each disease, all drugs were administered in the same manner.Finally, we validated the results using publicly available databases, such as The Comparative Toxicogenomics Database (CTD) 44 , DrugBank 27 , and KEGG 45 .Tables 4, 5, 6 and 7 show the high-ranked prediction results for selected drugs: doxorubicin, gabapentin, levodopa, and flecainide.From these results, we identified 10 evidences out of 10 for doxorubicin, 8 out of 9 for gabapentin, 6 out of 9 for levodopa, and 6 out of 8 for flecainide.Tables 8, 9, 10 and 11 show the high-ranked prediction results for selected diseases: type 2 diabetes mellitus, ischemic stroke, Alzheimer's disease, and Parkinson's disease.From these results, we identified 9 evidences out of 10 for type 2 diabetes mellitus, 6 out of 9 for ischemic stroke, 8 out of 10 for Alzheimer's disease, and 8 out of 10 for Parkinson's disease.The drugs listed in Table 8 serve therapeutic purposes across various medical conditions, including gastrointestinal disorders (ranked 2, 6 and 7), hypercholesterolemia (ranked 3 and 8), and obesity (ranked 4, 5, and 9).Canagliflozin, initially indicated for type-1 diabetes and occupying the first rank, has been used for type-2 diabetes management.Recent advancements have extended the utility of several medications originally intended for diabetes treatment, such as liraglutide and semaglutide, to address obesity.These results provide insights into the appropriateness of the pharmaceuticals listed by our model, in accordance with evolving therapeutic trends.www.nature.com/scientificreports/The drugs to treat ischemic stroke in Table 9 belong to several groups, including blood pressure management, antithrombotic agents, antiplatelet medications, and vasodilators.These medications play a crucial role in reducing the risks associated with ischemic stroke, which occurs when blood flow to the brain is blocked or reduced.The strong correlation between these drugs and ischemic stroke underscores their significance in managing this condition.Furthermore, most of the medications listed in Tables 10 and 11 are known as treating Parkinson's disease and Alzheimer's disease, respectively.These results demonstrate the strong correlation between Parkinson's and Alzheimer's diseases.

Discussion and conclusion
Computational drug repositioning is a promising research area because it remarkably reduces the time, costs, and risk associated with traditional drug discovery.Particularly, network-based computational approaches have widely been applied because they can effectively predict and validate drug-disease associations in a system level.In this article, we presented a novel method to predict drug-disease associations using a drug-disease   www.nature.com/scientificreports/heterogeneous network.Unlike other methods, our approach adopted a random walk algorithm and recalculated transition probabilities based on attention mechanisms.The experimental results revealed that the proposed method outperformed state-of-the-art methods.DRAW also had substantially higher predictive accuracy than the deep learning algorithms proposed previously.The proposed model has several significant advantages.First, it demonstrates efficiency in memory usage by conducting a random walk on a subgraph enclosing each pair for association prediction, rather than on an entire heterogeneous network.Consequently, it is applicable to large networks on an omics scale.Second, our model eliminates the need for re-training even when new drugs or diseases are added because it takes a subgraph as input, regardless of the number of nodes and edges.
Our experimental results showed that all methods performed better for drug side prediction rather than disease side prediction regardless of the input network.This suggests that computational drug repositioning may be better suited for identifying additional diseases that can be treated by new drugs.This result might be obtained because of inaccuracy of the measured similarities between drugs, or the unbalanced numbers of drugs and diseases in the input network.Nevertheless, DRAW had the highest predictive accuracy on the disease side.
All methods in our experiment also performed better with the network constructed by the similarities based on ATC codes rather than that by structural similarities.Utilizing ATC codes for classifying drug-disease associations leverages the therapeutic and pharmacological properties of their active ingredients, offering a more relevant measure of efficacy than chemical structure analysis.Furthermore, because the hyperparameter of network density was optimized on 4\% in our experiment, it was validated that incorporating higher densities, i.e., including connections with lower similarity scores, detracts from the effectiveness of network-based methodologies.In other words, a limited number of drug or disease pairs with high similarities provide sufficient information for drug-disease association prediction.
Several future directions for this research are suggested to enhance the effectiveness of computational drug repositioning.First, multiple biological, therapeutic features regarding diseases and medications can be integrated to improve the predictive accuracy of drug-disease associations.In particular, the integration with additional data of drug-target interactions might have a great influence on association prediction.Because of recent active research of drug-target interaction prediction, the number of open-source databases containing putative drug targets has been rapidly increased, such as DrugBank 27 , BindingDB 46 , SuperTarget 47 , and STITCH 48 .Next, the proposed model might be improved further by discriminating between node types in a drug-disease heterogeneous network.For example, node2vec 49 , one of the most widely used node embedding methods, did not differentiate between node types.However, HIN2vec 50 , an extension of node2vec, facilitated performance improvement by including the features of the graph heterogeneity.

Figure 1 .
Figure 1.The workflow of the proposed model, DRAW.A subgraph composed of the nodes of the target link and their neighbors is extracted from the original drug-disease heterogeneous network.Nodes in the subgraph are labeled by DRNL and fed into the GCN.Transition probabilities P + and P − are computed by the attention conducted on G + and G − with the features from the GCN.After a random walk is completed, all features are used as input of MLP, a binary graph classifier.

Figure 2 .
Figure 2. Accuracy comparison of the proposed method, DRAW, and the five state-of-the-art methods for drug-disease association prediction when network-ATC is used: (a) ROC curves on the drug-side, (b) Precision*-recall curves on the drug-side, (c) ROC curves on the disease-side, and (d) Precision*-recall curves on the disease-side.

Figure 3 .
Figure 3. Accuracy comparison of the proposed method, DRAW, and the five state-of-the-art methods for drug-disease association prediction when network-CS is used: (a) ROC curves on the drug-side, (b) Precision*recall curves on the drug-side, (c) ROC curves on the disease-side, and (d) Precision*-recall curves on the disease-side.

Figure 4 .
Figure 4.The distributions of AUC values from 10 folds when the network density thresholds are 2%, 3%, 4%, and 5%.Drug-disease associations were predicted on the drug side with network-ATC by DRAW.The highest median AUC was achieved when the density threshold was 4%.

Table 2 .
Accuracy comparison for drug-disease association prediction by 10-fold cross-validation when network-ATC is used.The highest score in each evaluation category is in bold.

Table 3 .
Accuracy comparison for drug-disease association prediction by 10-fold cross-validation when network-CS is used.The highest score in each evaluation category is in bold.

Table 4 .
The top 10 candidate diseases to be treated by doxorubicin.

Table 5 .
The top 9 candidate diseases to be treated by gabapentin.

Table 6 .
The top 9 candidate diseases to be treated by levodopa.

Table 7 .
The top 8 candidate diseases to be treated by flecainide.

Table 8 .
The top 10 candidate medications to treat type 2 diabetes mellitus.

Table 9 .
The top 9 candidate medications to treat Ischemic stroke.

Table 10 .
The top 10 candidate medications to treat Alzheimer's disease.

Table 11 .
The top 10 candidate medications to treat Parkinson's disease.